1. Field of the Art
The present invention generally relates to the field of electronic data analysis, and more specifically, to the formation of specialized dictionaries for a given corpus of electronic documents.
2. Description of the Related Art
A number of organizations manually create dictionaries of conventional words or phrases considered to be standard for the languages in which the dictionaries are written. However, such conventional dictionaries often omit many specialized phrases of interest in particular contexts. These include fictional phrases (e.g., names and terms defined in a series of works of fiction, such as the term “Quidditch”), terminology specific to particular fields (such as nautical or legal terms), and other types of domain-specific phrases. Although some fragmentary, non-comprehensive specialized dictionaries may have been created through manual effort, such manual techniques are not suitable for all domains.